-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filtering to fieldcaps endpoint #83636
Conversation
Pinging @elastic/es-search (Team:Search) |
Hi @romseygeek, I've created a changelog YAML for you. |
Pinging @elastic/clients-team (Team:Clients) |
- is_true: fields.misc\\.keyword | ||
|
||
--- | ||
"Field type filters": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test case for multiple values in the fields
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started reviewing this a while ago, then I got sidetracked, but I eventually got back to it. I left some comments that I think we should address, even if this PR has already been merged.
@@ -77,6 +77,15 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailab | |||
(Optional, Boolean) If `true`, unmapped fields are included in the response. | |||
Defaults to `false`. | |||
|
|||
`filters`:: | |||
(Optional, string) Comma-separated list of filters to apply to the response. The | |||
following filters are supported: +metadata,-metadata,-parent,-nested,-multifield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the +
notation supported only for metadata? Would it make sense to expand on what each one of these mean, with a link to the relevant docs explanation for each one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll expand the docs. The idea is that -
and +
work in much the same way they do in our query parsers = -
means 'must not match', and +
means 'must match'. So you can ask for only metadata, or to exclude metadata. For the others it doesn't make much sense to do anything except exclude them, with the exception of parent
where it sort of makes sense to say 'include parent data as well as everything else', so a bare parent
term is supported.
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesFetcher.java
Show resolved
Hide resolved
following filters are supported: +metadata,-metadata,-parent,-nested,-multifield | ||
|
||
`types`:: | ||
(Optional, string) Comma-separated list of field types to include. Any fields that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we link the docs page with all of the field types here? Maybe also include an example at the end of the sentence?
`types`:: | ||
(Optional, string) Comma-separated list of field types to include. Any fields that | ||
do not match one of these types will be excluded from the results. Defaults to empty, | ||
meaning that all field types are returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question, just to double check: types:keyword means either runtime or indexed field, no distinction between the two for field_caps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keyword runtime fields return KeywordFieldMapper.CONTENT_TYPE
as their family name (which is what is actually checked here). I also wonder if it's worth adding a -runtime
, +runtime
pair of filters. I know you don't like making it too obvious what is runtime and what isn't, but I think it can still be helpful for clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's wait for them to ask? :)
if ("-parent".equals(filter)) { | ||
return false; | ||
} | ||
if ("parent".equals(filter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this one documented: is it +parent
or just parent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just plain parent
and acts as the equivalent of a SHOULD
clause - in other words, do return parent information. Clients that aren't filtering out nested or multifield entries still use this to do their own checks, but once they start using filters they can filter out parents instead. At some point in the future I envisage that we will filter out parents by default, and then clients will have to explicitly include parent
if they still want that information.
server/src/main/java/org/elasticsearch/action/fieldcaps/ResponseRewriter.java
Show resolved
Hide resolved
@@ -336,6 +336,9 @@ public NestedLookup nestedLookup() { | |||
} | |||
|
|||
public boolean isMultiField(String field) { | |||
if (fieldTypeLookup.isRuntimeField(field)) { | |||
return false; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that there are existing callers of this method in FieldMapper. Was it not accurate before? I get thrown off that we look at field mappers later, and runtime fields don't have field mappers, so why do we need this additional if and what effect does it have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used when validating copy_to
fields, in which case we don't really care about runtime fields. But it is important for field cap filtering, because otherwise we might incorrectly a runtime field foo.bar
as a multifield of an otherwise existing field foo
.
Thinking about it though, I reckon we can do this without looking at runtime fields just by checking to see if both the passed-in field name and its parent are field mappers. Let me see what I can come up with.
instance.runtimeFields() | ||
); | ||
} | ||
default -> throw new IllegalStateException("The test should only allow 7 parameters mutated"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing a serialization tests that uses multiple versions, to exercise the bwc layer.
) | ||
); | ||
assertThat( | ||
request.getDescription().length(), | ||
lessThanOrEqualTo(1024 + ("indices[" + "s9999,... (9999 in total, 9999 omitted)], fields[a]").length()) | ||
lessThanOrEqualTo(1024 + ("indices[" + "s9999,... (9999 in total, 9999 omitted)], fields[a], filters[], types[]").length()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too, I think we are missing a serialization tests that uses multiple versions, to exercise the bwc layer.
server/src/main/java/org/elasticsearch/index/mapper/FieldTypeLookup.java
Show resolved
Hide resolved
Thanks for all the comments @javanna, I will open a follow-up PR to address them. |
* adds a test for mixed cluster requests * fixes a bad stream version check (above test will fail if this isn't included) * replaces private FieldCapsFilter interface with Predicate * renames 'allowedTypes' to 'types' to maintain consistency with external API * adds javadoc to ResponseRewriter * removes isRuntimeField from FieldTypeLookup Relates to #83636
Many consumers of the field caps API need to do some post-processing of the
results before they can use them; for instance, Kibana would like to exclude
multifields from certain field selections, or would like to display only
geo_point
fields in Maps. ML and QL consumers exclude nested fields in certain
circumstances. This post-processing is possible at the moment, but can be
hacky; and in all cases it involves sending the whole (possibly very large) field
caps response over the wire and then whittling it down in the client. It is also not
guaranteed to be accurate - runtime fields may be incorrectly classified as multifields,
for example.
This commit pushes filtering into elasticsearch itself, reducing the amount of data
that needs to be transported and ensuring better accuracy. The field caps API gets
two new parameters:
filters
- a comma-delimited list that may contain any combination of:+metadata
,-metadata
,-nested
,-parent
,-multifield
types
- a comma-delimited list of field types; only fields that have a type in this setwill be returned
The API will make best-effort attempts to apply the filters post-hoc to responses from
older nodes, so this should still work in a mixed-cluster or cross-cluster situation.
Fixes #82966, #72174